# Multimodal Audio Understanding
Qwen 2 Audio Instruct Dynamic Fp8
Apache-2.0
Qwen2-Audio is the latest version of the Qwen large audio-language model series. It accepts a variety of audio inputs and can either analyze the audio or generate text responses directly from spoken instructions.
Text-to-Audio
Transformers English

mlinmg
24
0
Mini Ichigo Llama3.2 3B S Instruct
Apache-2.0
Ichigo-llama3s is a series of multimodal language models developed by Homebrew Research that natively understand both audio and text input. The models are based on the Llama-3 architecture and are trained with WhisperVQ as the tokenizer for audio files, which improves their audio understanding.
Text-to-Audio English
Menlo
22
34